# High-Resolution Image Processing

DeepEyes is a vision-language model trained with reinforcement learning to "think with images": it integrates visual information directly into its reasoning chain and is suited to image-text tasks.

Transformers English

Upernet Swin Large

UPerNet semantic segmentation model based on Swin Transformer architecture, suitable for high-precision image segmentation tasks

Image Segmentation

Upernet Swin Small

UPerNet semantic segmentation model based on the Swin Transformer Small architecture, suitable for scene parsing on datasets such as ADE20K

Image Segmentation

Upernet Swin Tiny

UPerNet semantic segmentation model based on the Swin Transformer Tiny architecture, suitable for image segmentation tasks

Image Segmentation

Segformer B5 Finetuned Coralscapes 1024 1024

SegFormer model optimized for coral reef semantic segmentation tasks, fine-tuned on the Coralscapes dataset at 1024x1024 resolution

Image Segmentation

Segformer B2 Finetuned Coralscapes 1024 1024

This is a semantic segmentation model based on the SegFormer architecture, specifically optimized for coral reef ecosystem image segmentation tasks and fine-tuned on the Coralscapes dataset.

Image Segmentation

Vit So400m Patch14 Siglip Gap 448.pali Mix

A vision-language model based on the SigLIP image encoder, utilizing global average pooling, suitable for multimodal tasks.
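Global average pooling, as named in the entry above, collapses the encoder's per-patch token embeddings into a single image-level vector by taking their element-wise mean. The following is a minimal pure-Python sketch of that idea only; the helper names `num_patches` and `global_average_pool` and the toy vectors are illustrative assumptions, not the model's actual implementation.

```python
def num_patches(image_size, patch_size):
    """Number of non-overlapping square patches in a square image."""
    side = image_size // patch_size
    return side * side


def global_average_pool(tokens):
    """Average a list of patch-token vectors into one image-level vector."""
    if not tokens:
        raise ValueError("expected at least one token")
    dim = len(tokens[0])
    return [sum(tok[k] for tok in tokens) / len(tokens) for k in range(dim)]


# A 448x448 input with 14x14 patches (the sizes in the model name) gives a 32x32 grid.
patch_count = num_patches(448, 14)  # 1024

# Toy example: two 2-dimensional "patch tokens" (hypothetical values).
pooled = global_average_pool([[1.0, 2.0], [3.0, 4.0]])  # [2.0, 3.0]
```

Pooling this way produces a fixed-size output regardless of how many patch tokens the input resolution yields.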

Segformer B3 1024x1024 City 160k

A semantic segmentation model based on the SegFormer architecture, optimized for the Cityscapes dataset

Image Segmentation

Segformer B0 1024x1024 City 160k

A lightweight semantic segmentation model based on the SegFormer architecture, pre-trained on the Cityscapes dataset

Image Segmentation

Segformer B2 1024x1024 City 160k

A semantic segmentation model based on the SegFormer architecture, specifically optimized for the Cityscapes dataset

Image Segmentation

Segformer B1 512x512 Ade 160k

PyTorch-based SegFormer model for semantic segmentation tasks, pre-trained on the ADE20K dataset

Image Segmentation

Mplug Owl3 7B 241101

mPLUG-Owl3 is a multimodal large language model aimed at long image-sequence understanding. Its hyper attention mechanism improves processing speed and extends the supported sequence length.

Safetensors English

Beit Base Patch16 384.in1k Ft Fungitastic 384

A BEiT-based image classification model for Danish fungi, designed to identify and classify fungal species.

Image Classification

Llava Jp 1.3b V1.1

LLaVA-JP is a multimodal vision-language model that supports Japanese and can understand input images and generate descriptions and dialogue about them.

Transformers Japanese

Vitamin XL 256px

ViTamin-XL-256px is a vision-language model based on the ViTamin architecture, designed for efficient visual feature extraction and multimodal tasks, supporting high-resolution image processing.

Vitamin XL 384px

ViTamin-XL-384px is a large-scale vision-language model based on the ViTamin architecture, specifically designed for vision-language tasks, supporting high-resolution image processing and multimodal feature extraction.

Siglip So400m 14 980 Flash Attn2 Navit

SigLIP-based vision model that raises the maximum resolution to 980x980 through interpolated positional embeddings and implements the NaViT strategy for variable-resolution, aspect-ratio-preserving image processing
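Interpolating positional embeddings, as mentioned in the entry above, means resizing the learned grid of per-patch position vectors so that it covers the larger patch grid of a higher-resolution input. The sketch below illustrates the general idea in pure Python using bilinear interpolation. The interpolation mode, the function names `grid_side` and `interpolate_pos_embed`, and the toy 2x2 grid are all assumptions for illustration; the source does not state how the model's embeddings were actually resized.

```python
def grid_side(image_size, patch_size):
    """Patches per side for a square image split into square patches."""
    return image_size // patch_size


def interpolate_pos_embed(grid, new_size):
    """Bilinearly resize a square grid of embedding vectors.

    grid: list of rows; each row is a list of vectors (lists of floats).
    new_size: side length of the target grid. Corner values are preserved.
    """
    old_size = len(grid)
    # Map target indices onto source coordinates so that corners align.
    scale = (old_size - 1) / (new_size - 1) if new_size > 1 else 0.0
    out = []
    for i in range(new_size):
        y = i * scale
        y0 = min(int(y), old_size - 1)
        y1 = min(y0 + 1, old_size - 1)
        wy = y - y0
        row = []
        for j in range(new_size):
            x = j * scale
            x0 = min(int(x), old_size - 1)
            x1 = min(x0 + 1, old_size - 1)
            wx = x - x0
            # Blend the four neighbouring source vectors, channel by channel.
            vec = [
                (1 - wy) * ((1 - wx) * a + wx * b) + wy * ((1 - wx) * c + wx * d)
                for a, b, c, d in zip(
                    grid[y0][x0], grid[y0][x1], grid[y1][x0], grid[y1][x1]
                )
            ]
            row.append(vec)
        out.append(row)
    return out


# A 980x980 input with 14x14 patches needs a 70x70 positional grid.
target_side = grid_side(980, 14)

# Toy demonstration: stretch a 2x2 grid of 1-dimensional embeddings to 3x3.
toy = [[[0.0], [1.0]], [[2.0], [3.0]]]
resized = interpolate_pos_embed(toy, 3)
```

Because the embeddings are resampled rather than retrained from scratch, the pretrained weights can be reused as a starting point at the new resolution.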

Sdxl Instructpix2pix 768

An image editing model fine-tuned on Stable Diffusion XL (SDXL) using the InstructPix2Pix method, supporting image editing through natural language instructions.

Image Generation

Segformer B5 Finetuned Ade 640 640

SegFormer is a Transformer-based semantic segmentation model fine-tuned on the ADE20K dataset, suitable for image segmentation tasks.

Image Segmentation

© 2025 AIbase